Identification of jets originating from b quarks (b-tagging) is a key element of many physics analyses at the LHC. The CMS experiment has developed various b-tagging algorithms that identify b jets with a typical efficiency between 40% and 70%, while keeping the rate of misidentified light-quark jets between 0.1% and 10%. Using these tools in physics analyses requires measuring the efficiency for tagging b jets. Several methods to measure the efficiencies of the lifetime-based b-tagging algorithms are presented. Events with jets containing muons are used to enrich a jet sample in heavy-flavor content. The efficiency measurement relies either on the transverse momentum of the muon relative to the jet axis or on solving a system of equations that incorporates two uncorrelated taggers. Another approach uses the number of b-tagged jets in top-quark pair events to estimate the efficiency. The results obtained with 2011 data and the uncertainties of the different techniques are reported. The rate of misidentified light-quark jets has been measured using the "negative" tagging technique.

I. INTRODUCTION 

B tagging, the identification of b jets, is of crucial importance for event topologies involving b quarks. Many standard model processes produce b quarks; in top-quark physics, for example, b-tagging is essential to distinguish signal from background processes. Higgs boson searches also depend heavily on b-tagging, since a Higgs boson with a mass around 120 GeV decays primarily to bb̄ pairs. For such processes, the b-tagging efficiency is therefore an important input to the analysis. The CMS detector has performed remarkably well, and data and simulation are in good agreement. However, b-tagging is a complex tool that relies on many aspects of detector performance, so it is essential to measure the b-tagging efficiency in data rather than rely exclusively on simulation.

The algorithms for b-jet identification exploit several salient features of B hadron decays. B hadrons have a relatively long lifetime of about 1.5 ps ($c\tau \approx 450~\mu\mathrm{m}$). Their mass of about 5.2 GeV is much higher than that of hadrons made of light quarks. They tend to decay into a large number of charged particles, with an average decay multiplicity of about 5. Because of the high mass, the fragmentation is hard, so the decay products have high pT. The semileptonic branching ratio of B hadrons is about 11% for each lepton flavor, rising to about 20% when $b \to c \to \ell$ cascade decays are taken into account. These properties allow b jets to be distinguished from light-quark (u, d, s) and gluon jets and, to a lesser extent, from c jets.
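To make the lifetime argument concrete: the mean decay length in the laboratory is $L = \beta\gamma c\tau$ with $\beta\gamma = p/m$. The short sketch below uses the mass (5.2 GeV) and $c\tau$ (450 μm) quoted above; the function name and the 52 GeV momentum in the example are hypothetical choices made only to give round numbers.

```python
def mean_decay_length_mm(p_gev: float, mass_gev: float = 5.2, ctau_um: float = 450.0) -> float:
    """Mean lab-frame decay length L = beta*gamma*c*tau, with beta*gamma = p/m."""
    if p_gev < 0 or mass_gev <= 0:
        raise ValueError("momentum must be non-negative and mass positive")
    beta_gamma = p_gev / mass_gev        # Lorentz boost of the B hadron
    return beta_gamma * ctau_um * 1e-3   # convert micrometres to millimetres


if __name__ == "__main__":
    # Hypothetical 52 GeV B hadron: beta*gamma = 10, so L is about 4.5 mm.
    print(f"{mean_decay_length_mm(52.0):.2f} mm")
```

A B hadron carrying a few tens of GeV therefore travels millimetres on average before decaying, which is the displacement the impact-parameter and secondary-vertex taggers exploit.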

II. B TAGGING ALGORITHMS 

The inputs to b-tagging are particle-flow jets, charged-particle tracks, and primary and secondary vertices. The jets are reconstructed with the anti-$k_T$ clustering algorithm with a distance parameter $\Delta R = 0.5$, where $\Delta R = \sqrt{(\Delta\eta)^2 + (\Delta\phi)^2}$ is defined in terms of the pseudorapidity $\eta$ and the azimuthal angle $\phi$. The tracks are reconstructed with a Kalman-filter-based method. The vertices are reconstructed from tracks compatible with the beam spot using the Adaptive Vertex Fitter algorithm. The output of each b-tagging algorithm is a discriminator: a variable, computed from the tracks associated with the jet, that is sensitive to the flavor of the jet. A working point is then chosen as a threshold on the discriminator. The loose, medium and tight operating points correspond to light-jet misidentification probabilities of about 10%, 1% and 0.1%, respectively.
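A working point can be pictured as a cut on the discriminator tuned so that a chosen fraction of light jets passes. The sketch below assumes a list of discriminator values for simulated light-flavor jets is available; `working_point`, `mistag_rate` and the uniform toy input are hypothetical, and a real calibration would use large simulated samples and account for the jet kinematics.

```python
from typing import List


def working_point(light_disc: List[float], target_mistag: float) -> float:
    """Return the discriminator cut for which at most `target_mistag`
    of the light jets have a discriminator strictly above the cut."""
    if not light_disc or not 0.0 < target_mistag < 1.0:
        raise ValueError("need light-jet values and 0 < target_mistag < 1")
    ranked = sorted(light_disc, reverse=True)  # highest discriminator first
    k = int(target_mistag * len(ranked))       # number of light jets allowed to pass
    return ranked[k]                           # only the k values above this one pass


def mistag_rate(light_disc: List[float], cut: float) -> float:
    """Fraction of light jets whose discriminator exceeds the cut."""
    return sum(d > cut for d in light_disc) / len(light_disc)


if __name__ == "__main__":
    # Toy light-jet discriminators spread uniformly between 0 and 0.999.
    toy = [i / 1000 for i in range(1000)]
    for name, target in [("loose", 0.10), ("medium", 0.01), ("tight", 0.001)]:
        cut = working_point(toy, target)
        print(name, cut, mistag_rate(toy, cut))
```

Ranking the light jets and reading off the k-th value avoids any assumption about the shape of the discriminator distribution.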

The lifetime-based taggers exploit the displacement of the B hadron decay products. The impact parameter (IP) is defined as the two- or three-dimensional distance between a track and the primary vertex at the point of closest approach, as shown in Fig. 1. It is signed positive if the point of closest approach lies along the jet direction with respect to the primary vertex, and negative otherwise. Since the IP uncertainty $\sigma_{IP}$ varies from track to track, the preferred b-tagging variable is the IP significance, $IP/\sigma_{IP}$. The lifetime-based taggers rely either on tracks with large impact parameters or on the presence of a reconstructed secondary vertex within the jet.

Track Counting (TC) and Jet Probability (JP) are impact-parameter-based taggers. The TC algorithm requires at least N tracks with $IP/\sigma_{IP} > S$, where S is a threshold; equivalently, its discriminator is the IP significance of the N-th track when the tracks are ordered by decreasing significance. In the high-efficiency (HE) version of this tagger N = 2, while the high-purity (HP) version uses N = 3. The more stringent HP requirement gives a lower b-tagging efficiency, but also a lower mis-tag rate. The JP tagger combines information from all tracks in the jet and computes the probability that these tracks come from the primary vertex. An alternative version of the JP tagger, used in analyses, enhances the b-flavor content by giving a higher weight to the four most displaced tracks; it is analogous to an HP version of the tagger.

The next set of b-tagging algorithms relies on the secondary vertex from the B hadron decay. The simple secondary vertex (SSV) tagger requires at least one reconstructed secondary vertex, and its discriminator is obtained from the significance of the 3D flight distance. SSVHE requires at least two tracks associated with the vertex, while SSVHP requires at least three.

These simple taggers do not require an elaborate calibration and are therefore well suited to early data taking. In addition, the combined secondary vertex (CSV) tagger is used; it combines various track and vertex variables through a multivariate technique.
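The two impact-parameter discriminators described above can be sketched as follows. Track Counting uses the IP significance of the N-th track (N = 2 for HE, N = 3 for HP), which is equivalent to requiring N tracks above a threshold S. For Jet Probability, the sketch assumes that per-track probabilities of compatibility with the primary vertex are already available (in practice they come from a calibrated resolution function, not modeled here) and combines them with the standard jet-probability formula $P_{jet} = \Pi \sum_{k=0}^{N-1} (-\ln \Pi)^k / k!$, where $\Pi$ is the product of the N track probabilities; that this is exactly the CMS combination is an assumption. All function names are hypothetical.

```python
import math
from typing import List, Optional


def track_counting(ip_significances: List[float], n: int) -> Optional[float]:
    """TC discriminator: IP significance of the n-th track after sorting by
    decreasing significance (n=2: TCHE, n=3: TCHP). None if fewer than n tracks."""
    ranked = sorted(ip_significances, reverse=True)
    return ranked[n - 1] if len(ranked) >= n else None


def is_tc_tagged(ip_significances: List[float], n: int, threshold: float) -> bool:
    """Tagged iff at least n tracks have IP/sigma_IP > threshold."""
    disc = track_counting(ip_significances, n)
    return disc is not None and disc > threshold


def jet_probability(track_probs: List[float]) -> float:
    """Probability that all tracks (each with probability in (0, 1]) come from
    the primary vertex: P_jet = Pi * sum_{k=0}^{N-1} (-ln Pi)^k / k!."""
    if not track_probs:
        raise ValueError("need at least one track")
    pi = math.prod(track_probs)
    log_term = -math.log(pi)
    return pi * sum(log_term**k / math.factorial(k) for k in range(len(track_probs)))


if __name__ == "__main__":
    sigs = [8.1, -1.2, 3.3, 5.0]  # hypothetical IP significances of one jet
    print(track_counting(sigs, 2), track_counting(sigs, 3))
    print(round(jet_probability([0.5, 0.5]), 4))
```

A larger N-th track significance, or a smaller $P_{jet}$, indicates a more b-like jet; a decreasing transform such as $-\ln P_{jet}$ can serve as the JP discriminator.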




FIG. 1: Definition of positive and negative impact parameters with respect to the primary vertex (PV).



A. Efficiency measurement from muon-jet events: the pTrel method

The pTrel method uses semileptonic B hadron decays, which give rise to b jets containing a muon ("muon jets"). The variable pTrel is defined as the transverse momentum of the muon with respect to the jet direction, as illustrated in Fig. 2. Because of the large b-quark mass, pTrel is larger for muons from B hadron decays than for muons in c or light-flavor jets. A sample with enhanced b-jet purity is constructed by requiring two reconstructed jets: the muon jet and another jet fulfilling the b-tagging criterion. The pTrel spectra of muon jets originating from b, c and light-flavor partons are obtained from simulation. The quantities $f_b^{\mathrm{tag}}$ ($f_b^{\mathrm{untag}}$) are defined as the fractions of b jets among the muon jets that pass (fail) the b-tagging requirement. These fractions are extracted with a maximum likelihood fit of the b and non-b (c + light flavor) pTrel templates to the data. The fractions and the total numbers of tagged and untagged muon jets ($N_{\mathrm{data}}^{\mathrm{tag}}$, $N_{\mathrm{data}}^{\mathrm{untag}}$) are used to calculate the efficiency:

$$\varepsilon_b^{\mathrm{tag}} = \frac{f_b^{\mathrm{tag}}\, N_{\mathrm{data}}^{\mathrm{tag}}}{f_b^{\mathrm{tag}}\, N_{\mathrm{data}}^{\mathrm{tag}} + f_b^{\mathrm{untag}}\, N_{\mathrm{data}}^{\mathrm{untag}}}$$

The fits to the pTrel distributions are shown in Fig. 3.
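The two steps of the pTrel method can be sketched with a toy binned fit: the b fraction of a muon-jet sample is obtained by maximizing a Poisson likelihood in which the pTrel histogram is modeled as a mixture of a b template and a non-b (c + light) template, and the fitted fractions then enter $\varepsilon_b^{\mathrm{tag}} = f_b^{\mathrm{tag}} N^{\mathrm{tag}} / (f_b^{\mathrm{tag}} N^{\mathrm{tag}} + f_b^{\mathrm{untag}} N^{\mathrm{untag}})$. The templates, binning, grid-scan minimizer and function names are illustrative assumptions, not the analysis implementation.

```python
import math


def fit_b_fraction(data, tmpl_b, tmpl_nonb, steps=10001):
    """Binned maximum-likelihood fit of the b fraction f. Expected counts are
    N * (f * b_i + (1 - f) * o_i) with unit-normalized templates b_i and o_i;
    f is found by a simple grid scan over [0, 1]."""
    n_tot = sum(data)
    sb, so = sum(tmpl_b), sum(tmpl_nonb)
    pb = [t / sb for t in tmpl_b]
    po = [t / so for t in tmpl_nonb]
    best_f, best_ll = 0.0, -math.inf
    for i in range(steps):
        f = i / (steps - 1)
        ll = 0.0
        for n, b, o in zip(data, pb, po):
            mu = n_tot * (f * b + (1.0 - f) * o)
            if mu > 0.0:
                ll += n * math.log(mu) - mu  # Poisson log-likelihood, constant dropped
            elif n > 0:
                ll = -math.inf               # events where the model predicts none
                break
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f


def ptrel_efficiency(f_tag, n_tag, f_untag, n_untag):
    """eps_b = f_tag*N_tag / (f_tag*N_tag + f_untag*N_untag)."""
    nb_tag = f_tag * n_tag
    nb_untag = f_untag * n_untag
    return nb_tag / (nb_tag + nb_untag)


if __name__ == "__main__":
    tmpl_b = [1, 2, 3, 4]          # hypothetical b template (harder pTrel spectrum)
    tmpl_nonb = [4, 3, 2, 1]       # hypothetical c + light template
    tagged = [220, 240, 260, 280]  # toy tagged sample built with f = 0.6
    f_tag = fit_b_fraction(tagged, tmpl_b, tmpl_nonb)
    print(f_tag, ptrel_efficiency(f_tag, 1000, 0.4, 500))
```

The same fit is applied separately to the tagged and untagged samples; in practice a dedicated minimizer with uncertainty estimates would replace the grid scan.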

B. "System 8"

System8 is a data-driven method with minimal dependence on simulation. Like the pTrel method, System8 takes advantage of semileptonic B hadron decays and is applied to a sample of muon-jet events. A system of eight nonlinear equations is set up and solved numerically. Two data samples are used:

• The muon jet + away-jet sample: events containing two reconstructed jets and a muon within ΔR < 0.4 of one of the jets. If the jet contains more than one muon, the highest-pT muon is used. If both jets in an event contain muons, both are counted as muon jets.

• The muon jet + tagged-away-jet sample: the subset of events in which the away jet is b-tagged. Since b quarks are produced in pairs, a b tag in the away jet enhances the probability that the muon jet in the same event is also a b jet.
FIG. 2: pTrel is defined as the transverse momentum of the muon with respect to the jet direction.




FIG. 3: Fits of the muon pTrel distributions to b and light-flavor templates for jets containing muons that (left) pass or (right) fail the b-tagging algorithm SSVHPT (Simple Secondary Vertex High Purity, Tight operating point). The fitted fractions and the total yields ($N_{\mathrm{data}}^{\mathrm{tag}}$, $N_{\mathrm{data}}^{\mathrm{untag}}$) are used to calculate the efficiency.

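The first two equations are:

$$n = n_b + n_{cl} \qquad (1)$$
$$p = p_b + p_{cl} \qquad (2)$$

Here, n and p are the numbers of muon jets in the muon jet + away-jet sample and in the muon jet + tagged-away-jet sample, respectively, and the subscripts b and cl denote b jets and c + light-flavor jets.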

Two different taggers are used: a test tagger ("tag"), which in this case is a lifetime-based tagger, and a cut on pTrel. This choice is dictated by the requirement that the two taggers be minimally correlated.

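The next set of equations is therefore:

$$n^{\mathrm{tag}} = \varepsilon_b^{\mathrm{tag}}\, n_b + \varepsilon_{cl}^{\mathrm{tag}}\, n_{cl} \qquad (3)$$
$$p^{\mathrm{tag}} = \beta_{12}\, \varepsilon_b^{\mathrm{tag}}\, p_b + \alpha_{12}\, \varepsilon_{cl}^{\mathrm{tag}}\, p_{cl} \qquad (4)$$

Here, $n^{\mathrm{tag}}$ and $p^{\mathrm{tag}}$ are the numbers of lifetime-tagged muon jets.

$$n^{p_T^{rel}} = \varepsilon_b^{p_T^{rel}}\, n_b + \varepsilon_{cl}^{p_T^{rel}}\, n_{cl} \qquad (5)$$
$$p^{p_T^{rel}} = \beta_{23}\, \varepsilon_b^{p_T^{rel}}\, p_b + \alpha_{23}\, \varepsilon_{cl}^{p_T^{rel}}\, p_{cl} \qquad (6)$$

Here, $n^{p_T^{rel}}$ and $p^{p_T^{rel}}$ are the numbers of muon jets passing a cut on the pTrel distribution.

The last set of equations results from applying both tags:

$$n^{\mathrm{tag},p_T^{rel}} = \beta_{13}\, \varepsilon_b^{\mathrm{tag}}\, \varepsilon_b^{p_T^{rel}}\, n_b + \alpha_{13}\, \varepsilon_{cl}^{\mathrm{tag}}\, \varepsilon_{cl}^{p_T^{rel}}\, n_{cl} \qquad (7)$$
$$p^{\mathrm{tag},p_T^{rel}} = \beta_{123}\, \varepsilon_b^{\mathrm{tag}}\, \varepsilon_b^{p_T^{rel}}\, p_b + \alpha_{123}\, \varepsilon_{cl}^{\mathrm{tag}}\, \varepsilon_{cl}^{p_T^{rel}}\, p_{cl} \qquad (8)$$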

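The correlation factors ($\alpha_{12}$, $\beta_{12}$, $\alpha_{23}$, $\beta_{23}$, $\alpha_{13}$, $\beta_{13}$, $\alpha_{123}$, $\beta_{123}$) are obtained from simulation. Writing (n) for the muon jet + away-jet sample and (p) for the muon jet + tagged-away-jet sample, they are defined as:

$$\beta_{12} = \frac{\varepsilon_b^{\mathrm{tag}}(p)}{\varepsilon_b^{\mathrm{tag}}(n)} \qquad (9)$$
$$\alpha_{12} = \frac{\varepsilon_{cl}^{\mathrm{tag}}(p)}{\varepsilon_{cl}^{\mathrm{tag}}(n)} \qquad (10)$$
$$\beta_{23} = \frac{\varepsilon_b^{p_T^{rel}}(p)}{\varepsilon_b^{p_T^{rel}}(n)} \qquad (11)$$
$$\alpha_{23} = \frac{\varepsilon_{cl}^{p_T^{rel}}(p)}{\varepsilon_{cl}^{p_T^{rel}}(n)} \qquad (12)$$

and

$$\beta_{13} = \frac{\varepsilon_b^{\mathrm{tag},p_T^{rel}}(n)}{\varepsilon_b^{\mathrm{tag}}(n)\, \varepsilon_b^{p_T^{rel}}(n)}, \qquad \alpha_{13} = \frac{\varepsilon_{cl}^{\mathrm{tag},p_T^{rel}}(n)}{\varepsilon_{cl}^{\mathrm{tag}}(n)\, \varepsilon_{cl}^{p_T^{rel}}(n)} \qquad (13)$$

for the muon jet + away-jet sample, and

$$\beta_{123} = \frac{\varepsilon_b^{\mathrm{tag},p_T^{rel}}(p)}{\varepsilon_b^{\mathrm{tag}}(n)\, \varepsilon_b^{p_T^{rel}}(n)}, \qquad \alpha_{123} = \frac{\varepsilon_{cl}^{\mathrm{tag},p_T^{rel}}(p)}{\varepsilon_{cl}^{\mathrm{tag}}(n)\, \varepsilon_{cl}^{p_T^{rel}}(n)} \qquad (14)$$

for the muon jet + tagged-away-jet sample.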

These definitions are obtained by writing the left-hand side of each equation in terms of a composite efficiency and equating the b and the c + light-jet terms on each side. The correlation factors are the only quantities taken from simulation, which justifies describing the method as data-driven.
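Because $n_{cl} = n - n_b$ and $p_{cl} = p - p_b$, the System8 equations reduce to six equations in the six unknowns $n_b$, $p_b$, $\varepsilon_b^{\mathrm{tag}}$, $\varepsilon_{cl}^{\mathrm{tag}}$, $\varepsilon_b^{p_T^{rel}}$ and $\varepsilon_{cl}^{p_T^{rel}}$: the single-tag counts for each tagger and the double-tag counts in both samples. The sketch below solves them with a damped Newton iteration, given the observed counts and the eight correlation factors as fixed inputs. The solver, starting point, function names and all numbers are assumptions for illustration; the system generally also has an unphysical solution with the b and cl roles swapped, so the starting point matters.

```python
def system8_residuals(x, obs, corr):
    """Residuals (model - observed) for x = (n_b, p_b, eb_tag, ecl_tag, eb_rel, ecl_rel).
    n_cl = n - n_b and p_cl = p - p_b implement the two sample-size equations."""
    nb, pb, eb_t, ecl_t, eb_r, ecl_r = x
    ncl, pcl = obs["n"] - nb, obs["p"] - pb
    return [
        eb_t * nb + ecl_t * ncl - obs["n_tag"],
        corr["b12"] * eb_t * pb + corr["a12"] * ecl_t * pcl - obs["p_tag"],
        eb_r * nb + ecl_r * ncl - obs["n_rel"],
        corr["b23"] * eb_r * pb + corr["a23"] * ecl_r * pcl - obs["p_rel"],
        corr["b13"] * eb_t * eb_r * nb + corr["a13"] * ecl_t * ecl_r * ncl - obs["n_both"],
        corr["b123"] * eb_t * eb_r * pb + corr["a123"] * ecl_t * ecl_r * pcl - obs["p_both"],
    ]


def predict_counts(x, n, p, corr):
    """Forward model: the six tagged counts implied by x (used to build a toy)."""
    keys = ["n_tag", "p_tag", "n_rel", "p_rel", "n_both", "p_both"]
    zero = {"n": n, "p": p, **{k: 0.0 for k in keys}}
    return {**zero, **dict(zip(keys, system8_residuals(x, zero, corr)))}


def _max_abs(values):
    return max(abs(v) for v in values)


def _solve_linear(a, rhs):
    """Dense Gaussian elimination with partial pivoting."""
    size = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for k in range(col, size + 1):
                m[r][k] -= factor * m[col][k]
    sol = [0.0] * size
    for r in range(size - 1, -1, -1):
        acc = m[r][size] - sum(m[r][k] * sol[k] for k in range(r + 1, size))
        sol[r] = acc / m[r][r]
    return sol


def solve_system8(obs, corr, x0, tol=1e-9, max_iter=100):
    """Newton iteration with a forward-difference Jacobian and step halving."""
    x = [float(v) for v in x0]
    scale = obs["n"] + obs["p"]
    for _ in range(max_iter):
        f = system8_residuals(x, obs, corr)
        if _max_abs(f) < tol * scale:
            break
        jac = [[0.0] * len(x) for _ in f]
        for j in range(len(x)):
            h = 1e-7 * max(1.0, abs(x[j]))
            shifted = x[:]
            shifted[j] += h
            fh = system8_residuals(shifted, obs, corr)
            for i in range(len(f)):
                jac[i][j] = (fh[i] - f[i]) / h
        dx = _solve_linear(jac, [-v for v in f])
        step = 1.0
        while True:  # halve the step until the residual decreases
            trial = [xi + step * di for xi, di in zip(x, dx)]
            if _max_abs(system8_residuals(trial, obs, corr)) < _max_abs(f) or step < 1e-6:
                break
            step /= 2.0
        x = trial
    return x


if __name__ == "__main__":
    corr = {"b12": 1.05, "a12": 0.95, "b23": 0.98, "a23": 1.02,
            "b13": 1.01, "a13": 0.99, "b123": 1.03, "a123": 0.97}  # hypothetical
    truth = [4000.0, 2400.0, 0.60, 0.10, 0.70, 0.30]
    obs = predict_counts(truth, 10000.0, 3000.0, corr)
    sol = solve_system8(obs, corr, x0=[4200.0, 2300.0, 0.58, 0.12, 0.68, 0.28])
    print([round(v, 4) for v in sol])
```

Building the observed counts from a known truth and checking that the solver recovers it is a simple closure test of the numerical step.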

C. Measured b-tagging efficiencies 

This section presents the b-tagging efficiencies measured with the pTrel and System8 methods, parametrized in jet pT. Table I lists the efficiency values together with their statistical and systematic uncertainties; the sources of systematic uncertainty are described in the next section. The left panel of Fig. 4 shows good agreement between the two methods and with the Monte Carlo (MC) generator-level information. The right panel, however, shows considerable disagreement in the high-pT region, which is attributed to the low number of events in the high-pT bins when a high-purity tight operating point is used.

In all cases, the ratio of the efficiency measured in data to that in simulation, the scale factor $SF_b = \varepsilon_b^{\mathrm{data}} / \varepsilon_b^{\mathrm{MC}}$, is calculated. The scale factor measures how far the simulation departs from the data and is therefore expected to be close to 1. The scale factors, together with the efficiencies, are used in various physics analyses involving b jets. In Table II the scale factors are given for different pseudorapidity ($\eta$) ranges; no significant variation with $\eta$ is observed.
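The scale factor and a simple propagation of its uncertainty can be written as below, assuming the data and simulated efficiencies have uncorrelated uncertainties; the efficiencies in the example are hypothetical and not taken from the tables. The second helper combines statistical and systematic components in quadrature, a common convention when a single total uncertainty is needed (whether the analysis does this is not stated here).

```python
import math


def scale_factor(eff_data, err_data, eff_mc, err_mc):
    """SF_b = eff_data / eff_mc, with uncorrelated relative errors added in quadrature."""
    sf = eff_data / eff_mc
    rel = math.hypot(err_data / eff_data, err_mc / eff_mc)
    return sf, sf * rel


def total_uncertainty(stat, syst):
    """Combine statistical and systematic uncertainties in quadrature."""
    return math.hypot(stat, syst)


if __name__ == "__main__":
    sf, err = scale_factor(0.60, 0.03, 0.75, 0.0)  # hypothetical efficiencies
    print(f"SF = {sf:.2f} +- {err:.2f}")
    # e.g. a scale factor quoted as 0.99 +- 0.01 (stat) +- 0.10 (syst)
    print(f"total = {total_uncertainty(0.01, 0.10):.3f}")
```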
FIG. 4: b-tagging efficiency for the TCHEL (left panel) and SSVHPT (right panel) taggers as a function of muon-jet pT. The lower panels show the data/MC scale factors.



TABLE I: Measured b-tagging efficiencies (ε_b^tag) and data/MC scale factors (SF_b) for several b-tagging algorithms, for jets with pT between 50 and 80 GeV. Uncertainties are statistical for ε_b^tag and statistical and systematic for SF_b.

b-tagger   ε_b^tag (pTrel)   SF_b (pTrel)         ε_b^tag (System8)   SF_b (System8)
JPL        0.82 ± 0.01       0.97 ± 0.01 ± 0.05   0.85 ± 0.02         1.00 ± 0.02 ± 0.07
TCHEL      0.76 ± 0.01       0.95 ± 0.01 ± 0.05   0.77 ± 0.01         0.96 ± 0.02 ± 0.05
TCHEM      0.63 ± 0.01       0.93 ± 0.02 ± 0.06   0.63 ± 0.02         0.93 ± 0.02 ± 0.07
TCHPM      0.48 ± 0.01       0.92 ± 0.02 ± 0.05   0.49 ± 0.01         0.93 ± 0.03 ± 0.09
SSVHEM     0.62 ± 0.01       0.95 ± 0.02 ± 0.07   0.60 ± 0.01         0.94 ± 0.02 ± 0.06
SSVHPT     0.38 ± 0.01       0.89 ± 0.02 ± 0.06   0.37 ± 0.01         0.90 ± 0.03 ± 0.05
TCHPT      0.36 ± 0.01       0.88 ± 0.02 ± 0.05   0.37 ± 0.01         0.88 ± 0.03 ± 0.07



D. Systematic Uncertainties 



Several sources of systematic uncertainty were identified. Some are method dependent, while most are common to both methods. A systematic uncertainty specific to the pTrel method arises from mismodeling of the light-jet pTrel spectra. It was determined by constructing a collision-data sample with basic kinematic cuts and quoting the disagreement between data and simulation as the uncertainty. For the System8 method, the dependence on the event topology is a source of uncertainty; it was estimated by varying the MC-derived parameters in the equations and taking the resulting change. In addition, the pTrel cut was varied from 0.5 to 1.2 GeV to estimate the uncertainty associated with this requirement. The remaining sources of systematic uncertainty discussed below apply to both methods. The average systematic uncertainty is 6-7%. The contributions from each source are listed in Table III for the pTrel method and in Table IV for the System8 method.

• Pile-up: The distribution of the number of primary vertices in simulation was reweighted to match that in data. The systematic uncertainty was estimated by comparing samples with high and low pile-up.

• Away-jet tagger: The dependence of the measured b-tagging efficiency on the away-jet tagger was estimated by changing the away-jet tagger and its operating point.
TABLE II: Measured data/MC scale factors for several b-tagging algorithms in the overall jet pT range from 20 to 240 GeV, for |η| < 2.4, |η| < 1.2 and 1.2 < |η| < 2.4. The first uncertainty is statistical and the second systematic. The pTrel and System8 methods provide values compatible with each other.

b-tagger   SF_b (|η| < 2.4)      SF_b (|η| < 1.2)      SF_b (1.2 < |η| < 2.4)
JPL        0.99 ± 0.01 ± 0.10    0.99 ± 0.01 ± 0.10    0.98 ± 0.01 ± 0.10
TCHEL      0.95 ± 0.01 ± 0.10    0.95 ± 0.01 ± 0.10    0.95 ± 0.02 ± 0.10
TCHEM      0.94 ± 0.01 ± 0.09    0.94 ± 0.01 ± 0.09    0.93 ± 0.02 ± 0.09
TCHPM      0.91 ± 0.01 ± 0.09    0.91 ± 0.02 ± 0.09    0.90 ± 0.03 ± 0.09
SSVHEM     0.95 ± 0.01 ± 0.10    0.95 ± 0.01 ± 0.10    0.93 ± 0.02 ± 0.09
SSVHPT     0.90 ± 0.02 ± 0.09    0.89 ± 0.02 ± 0.09    0.90 ± 0.03 ± 0.09
TCHPT      0.88 ± 0.02 ± 0.09    0.88 ± 0.02 ± 0.09    0.87 ± 0.03 ± 0.09



• Muon pT: The muon pT cut was varied from its central value of 5 GeV to 7 and 10 GeV.

• Gluon splitting: To account for possible mismodeling of gluon splitting into bb̄ pairs, the number of events with gluon splitting in simulation was changed by a factor of two.

• Closure test: The methods were checked for self-consistency. The difference between the efficiency obtained by applying the method to simulated events and the true efficiency in simulation was quoted as the uncertainty.



TABLE III: Sources of systematic uncertainties for the pTrel method.

b-tagger   pile-up   away jet   muon pT   light-jet pTrel   g→bb̄
JPL        0.2%      3.0%       2.3%      2.8%              0.3%
TCHEL      2.4%      3.6%       1.5%      3.3%              0.2%
TCHEM      0.9%      5.1%       1.5%      3.7%              0.1%
TCHPM      1.8%      3.3%       2.6%      3.4%              0.4%
SSVHEM     1.4%      5.8%       1.9%      3.4%              0.6%
SSVHPT     1.1%      4.8%       2.8%      3.4%              0.6%
TCHPT      [?]       1.3%       2.3%      [?]               0.3%



TABLE IV: Sources of systematic uncertainties for the System8 method.

b-tagger   pile-up   away jet   muon pT   pTrel   g→bb̄   sample
JPL        5.1%      1.3%       0.8%      2.2%    0.1%    3.8%
TCHEL      3.3%      2.4%       2.8%      0.9%    0.6%    1.9%
TCHEM      5.8%      2.6%       0.9%      2.0%    0.7%    2.4%
TCHPM      4.8%      3.9%       4.9%      1.7%    2.1%    4.0%
SSVHEM     3.5%      4.6%       0.4%      1.8%    0.2%    3.0%
SSVHPT     1.2%      2.9%       2.8%      2.4%    0.2%    3.0%
TCHPT      3.5%      3.1%       4.0%      2.8%    2.5%    2.5%



III. CROSS-CHECKS WITH tt̄ EVENTS



FIG. 5: Signed b-tag discriminators, TCHE (left) and TCHP (right), in data (dots) and in simulation of light-flavor jets (blue), c jets (green) and b jets (red area), for jets with pT above 30 GeV.

In the standard model, the top quark decays to Wb at least 99.8% of the time. A measurement of the heavy-flavor content of tt̄ events can therefore lead to a measurement of $R_b = \mathcal{B}(t \to Wb)/\mathcal{B}(t \to Wq)$, where q is any down-type quark. Conversely, if $R_b$ is assumed to be 1, it can be used to extract the b-tagging efficiency. Several methods were used to determine the b-tagging efficiency:

• The Profile Likelihood Ratio method: This method uses dilepton tt̄ events. The distribution of jet multiplicity versus b-tagged jet multiplicity in dilepton tt̄ events is used to construct a likelihood function.

• The $R_b$ method: This method also relies on dilepton tt̄ events. The number of observed b-tagged jets is proportional to the fraction of b jets present, with $\varepsilon_b^{\mathrm{tag}}$ as the proportionality factor. The number of b-tagged jets in dilepton tt̄ events is modeled probabilistically using $\varepsilon_b^{\mathrm{tag}}$ and $\varepsilon^{\mathrm{mistag}}$.

• The Flavor Tag Consistency method: Lepton+jets tt̄ events are used as input. The procedure requires consistency between the observed and expected numbers of identified jets in tt̄ lepton+jets decays. A dedicated likelihood function is built from $\varepsilon_b^{\mathrm{tag}}$, $\varepsilon_c^{\mathrm{tag}}$ and $\varepsilon^{\mathrm{mistag}}$, together with the tt̄ cross section and the acceptance obtained from simulation.

• The Simultaneous Heavy Flavor and Top method: This method also uses lepton+jets tt̄ events. $\varepsilon_b^{\mathrm{tag}}$ is obtained from a two-dimensional fit to the number of jets and the invariant mass of the tracks forming the secondary vertex.

All of these methods give efficiency values that are compatible with the pTrel and System8 results and consistent with each other.
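Several of these cross-checks model the number of b-tagged jets per event from per-jet tag probabilities ($\varepsilon_b^{\mathrm{tag}}$ for b jets, $\varepsilon^{\mathrm{mistag}}$ for other jets). A minimal sketch of that building block is the distribution of the number of tagged jets for independent jets, followed by a grid-scan maximum-likelihood estimate of $\varepsilon_b^{\mathrm{tag}}$ from observed tag multiplicities. It assumes, hypothetically, that every selected event contains exactly two b jets and a fixed number of other jets; the real analyses also account for acceptance, backgrounds and the tt̄ cross section. Function names and counts are illustrative.

```python
import math


def tag_multiplicity(probs):
    """P(k tagged jets) for independent jets with per-jet tag probabilities
    `probs` (a Poisson-binomial distribution, built up one jet at a time)."""
    dist = [1.0]
    for q in probs:
        new = [0.0] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            new[k] += pk * (1.0 - q)  # this jet is not tagged
            new[k + 1] += pk * q      # this jet is tagged
        dist = new
    return dist


def fit_eff_b(counts, n_other, eff_mistag, steps=999):
    """Grid-scan MLE of eff_b, where counts[k] is the number of events with k tags,
    assuming each event has two b jets and `n_other` non-b jets."""
    best, best_ll = None, -math.inf
    for i in range(1, steps + 1):
        eb = i / (steps + 1)
        dist = tag_multiplicity([eb, eb] + [eff_mistag] * n_other)
        ll = sum(c * math.log(dist[k]) for k, c in enumerate(counts) if c > 0)
        if ll > best_ll:
            best, best_ll = eb, ll
    return best


if __name__ == "__main__":
    print([round(v, 3) for v in tag_multiplicity([0.7, 0.7, 0.1])])
    # Toy counts of events with 0, 1, 2 tags, generated with eff_b = 0.6.
    print(fit_eff_b([160, 480, 360], n_other=0, eff_mistag=0.0))
```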



IV. ESTIMATION OF MIS-TAG RATE WITH NEGATIVE TAGGERS

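The mis-tag rate is obtained from tracks with negative impact parameters or from secondary vertices with negative decay lengths. Tracks in light-flavor jets acquire nonzero impact parameters mainly through finite detector resolution, so their IP distribution is approximately symmetric about zero, whereas the B hadron lifetime produces predominantly positive values; the negative side therefore provides a light-jet-dominated sample. The TC discriminators are shown in Fig. 5. For the negative taggers, the negative IPs are ordered from the most negative upwards, while the ordering on the positive side remains unchanged; otherwise the negative taggers are used in the same way as the standard b-tagging algorithms. The mis-tag rate is evaluated as

$$\varepsilon^{\mathrm{mistag}} = \varepsilon^{-}_{\mathrm{data}} \cdot R_{\mathrm{light}}, \qquad R_{\mathrm{light}} = \frac{\varepsilon^{\mathrm{mistag}}_{\mathrm{MC}}}{\varepsilon^{-}_{\mathrm{MC}}},$$

where $\varepsilon^{-}_{\mathrm{data}}$ is the negative tag rate in data and $R_{\mathrm{light}}$ is the ratio between the light-flavor mis-tag rate and the negative tag rate of all jets in simulation. The measured mis-tag rates and the light-jet scale factors are given in Table V.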
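The negative-tag procedure can be sketched for Track Counting: tracks with negative IP significance are ranked from the most negative upwards, the N-th one defines the negative discriminator, and the negative tag rate measured in data is corrected by $R_{\mathrm{light}}$ from simulation. Considering only negative-IP tracks, the function names and the numbers in the example are assumptions for illustration.

```python
from typing import List, Optional


def negative_track_counting(ip_significances: List[float], n: int) -> Optional[float]:
    """Negative TC discriminator: |IP/sigma_IP| of the n-th most negative track."""
    flipped = sorted((-s for s in ip_significances if s < 0), reverse=True)
    return flipped[n - 1] if len(flipped) >= n else None


def negative_tag_rate(jets: List[List[float]], n: int, threshold: float) -> float:
    """Fraction of jets whose negative discriminator exceeds the threshold."""
    passed = 0
    for sigs in jets:
        disc = negative_track_counting(sigs, n)
        if disc is not None and disc > threshold:
            passed += 1
    return passed / len(jets)


def corrected_mistag_rate(neg_rate_data: float, mistag_mc: float, neg_rate_mc: float) -> float:
    """eps_mistag = eps_data^- * R_light, with R_light = eps_mistag^MC / eps_MC^-."""
    return neg_rate_data * (mistag_mc / neg_rate_mc)


if __name__ == "__main__":
    jets = [[-4.0, 2.0, -1.5, -3.0], [1.0, -0.5], [-6.0, -5.5, 0.2]]  # hypothetical
    print(negative_tag_rate(jets, n=2, threshold=2.0))
    print(corrected_mistag_rate(0.02, 0.016, 0.020))
```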
A. Systematic Uncertainties

The following sources of systematic uncertainty were considered:

• b and c fractions: The b + c flavor fraction is varied in the QCD simulation, giving a systematic uncertainty on $R_{\mathrm{light}}$ of 1-9%.

• Gluon fraction: The uncertainty is extracted from a comparison of simulation with data (0.2%).

• Long-lived $K_S^0$ and Λ decays, photon conversions and nuclear interactions (2.0%): Since these processes produce displaced vertices, QCD simulated events are reweighted to reproduce the $K_S^0$ and Λ yields observed in data.

• Mismeasured tracks: Spurious tracks increase the number of positive relative to negative tags (0.3%).

• Sign flip: The ratio of the numbers of negatively and positively tagged jets is computed in a muon-jet sample with a b purity above 80% (4.3%).

• Event sample: The measurement is repeated with jets from different event topologies. This is the dominant systematic uncertainty (10%).

• Pile-up: The uncertainty is estimated as described for the efficiency measurement (0.7%).
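If the sources listed above are treated as uncorrelated and added in quadrature (an assumption; the analysis may combine them differently or per tagger), the total relative uncertainty can be estimated as below. Because the b and c fraction term is quoted only as a range (1-9%), both ends are evaluated.

```python
import math

# Relative uncertainties in percent, as listed above (b + c fraction handled separately).
OTHER_SOURCES = {
    "gluon fraction": 0.2,
    "long-lived decays, conversions, nuclear interactions": 2.0,
    "mismeasured tracks": 0.3,
    "sign flip": 4.3,
    "event sample": 10.0,
    "pile-up": 0.7,
}


def quadrature_sum(values):
    """Square root of the sum of squares (uncorrelated sources)."""
    return math.sqrt(sum(v * v for v in values))


if __name__ == "__main__":
    for bc_fraction in (1.0, 9.0):
        total = quadrature_sum([bc_fraction, *OTHER_SOURCES.values()])
        print(f"b+c fraction {bc_fraction:.0f}% -> total {total:.1f}%")
```

Under these assumptions the total is about 11% (b + c fraction at 1%) to 14% (at 9%), of the same order as the 10-20% quoted in the conclusion.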



TABLE V: Mis-tag rate and data/MC scale factor for different b-taggers, for jets with pT between 50 and 80 GeV. Statistical and systematic uncertainties are quoted.

b-tagger   mis-tag rate (ε_data^mistag)   scale factor for light jets (ε_data^mistag / ε_MC^mistag)
JPL        0.077 ± 0.001 ± 0.016          0.98 ± 0.01 ± 0.11
TCHEL      0.128 ± 0.001 ± 0.026          1.11 ± 0.01 ± 0.12
TCHEM      0.0175 ± 0.0003 ± 0.0038       1.21 ± 0.02 ± 0.17
SSVHEM     0.0144 ± 0.0003 ± 0.0029       0.91 ± 0.02 ± 0.15
SSVHPT     0.0012 ± 0.0001 ± 0.0002       0.93 ± 0.09 ± 0.12
TCHPT      0.0017 ± 0.0001 ± 0.0004       1.21 ± 0.10 ± 0.18



V. CONCLUSION 

Several methods have been used to measure the b-jet tagging efficiency with an integrated luminosity of 0.50 to 0.89 fb$^{-1}$ collected by the CMS experiment in 2011. The data/MC scale factor is measured with an uncertainty of 10% for b jets with pT up to 240 GeV. For light-flavor jets with pT up to 500 GeV, the mis-tag rate is measured with an uncertainty of 10-20%. The b-tagging efficiencies are cross-checked with independent analyses using tt̄ events. B tagging is of crucial importance in events with topologies involving b quarks.